Training an Arabic LLM that reflects local values


Advances in the large language models that underpin generative AI are changing everything, from medicine and education to entertainment.

Our relationship with technology is becoming more intimate as machines change from passive tools into active assistants that amplify our innate human abilities.

This new era poses both a challenge and an opportunity for the Middle East.

The challenge is that the leaders in this new field, like OpenAI’s ChatGPT and Google’s Gemini, come from Silicon Valley, or from China, where my team at 01.AI has built models that rival those of the Americans. In Europe, too, startups such as France’s Mistral have entered the race.

The opportunity is for the Middle East to join this league and make sure its voice is heard.

Inspired by my latest trip to Riyadh, I decided to test how the current crop of AI models would handle a simple request. I imagined myself as a young Saudi getting ready to host a dinner party and asked ChatGPT to prepare a menu.

The food it recommended sounded delicious — stuffed grape leaves, tabouleh salad, mandi and stuffed dates. But the beverages were a problem.

Aside from drinks such as mint lemonade and jallab, a mixture of dates, grape molasses and rose water, ChatGPT also offered this: “For alcoholic beverages, you could offer a selection of international wines, beers, or non-alcoholic mocktails.”

To its credit, when I repeated the question, it offered only non-alcoholic drinks.

If a model recommends breaking both the law and cultural norms, imagine how it might answer other, more sensitive questions about politics or religion. Indeed, researchers have shown that some models exhibit an anti-Muslim bias.

My modest test underlines the urgent need to develop an Arabic large language model that reflects local values.

The first step to building this is creating enough high-quality Arabic digitized data to properly train a new generation of models.

Although there are 400 million Arabic speakers, only an estimated 2 percent of online content is in Arabic. Meta’s open-source LLM Llama is overwhelmingly trained on English data, with Arabic comprising less than 0.1 percent of the data.

The lack of data naturally skews the results. To fix this dearth of data, either a visionary entrepreneur or a government-backed organization should collect, digitize and convert the many Arabic books into training data for Arabic models.

Once the data is gathered, it can be fed into the breakthrough pre-training process, which reads trillions of words and creates its own virtual concept space or model of the world. This concept space has been shown to be mostly in English and Chinese.

Adding a sizable number of texts in Arabic, which has enormous cultural output and significance, will make the concept space more knowledgeable about Arabic and more balanced in its concepts and views.

After such pre-training, the model needs to be fine-tuned with data and labels from the Arab world, which will align it with the values of the region. That sets it apart from American models, which are aligned to US values, and Chinese models, which reflect Chinese values.

The collection of alignment data, the coordination of human labeling and the alignment process will need to be done in-region by AI experts.


Finally, safety modules will need to be added to ensure legal compliance and to avoid harm. These will also need to be developed locally.

The above steps will create localized, sovereign models that will reflect the traditions of the Middle East. Privately developed or government-backed, they could be the foundation for a new wave of Arabic AI innovation.

A new Arabic-enhanced large language model could encourage entrepreneurs and developers to build new applications tailored to the needs of their nations.

Imagine an AI tool that could find, summarize, organize and write insightful content, an AI teacher that makes learning fun and customized, an AI doctor that is more knowledgeable than any human, an AI engineer that can write software and applications, and an AI assistant that knows its owner better than the owner themselves.

The Arab world did not play a leading role in the PC, internet and mobile eras. In the AI era, it will be different.

This transformation is by no means an easy feat. It will require an unprecedented investment of money, energy and human capital.

Middle Eastern leaders such as Saudi Crown Prince Mohammed bin Salman have shown that they have the vision, determination and resources to lead their countries into the future.

Standing on my hotel balcony in Jeddah recently, overlooking the King Abdullah University of Science and Technology, I saw part of that vision coming to fruition.

Universities such as KAUST and the Mohamed bin Zayed University of Artificial Intelligence in the UAE are striking examples of the resources that have already been poured into this transformation.

These world-class academic institutions can attract and retain top-tier global talent. It is especially important to bring in the world’s best computer engineers to help fulfill this vision of the future of AI.

Our team at 01.AI has shown what a group of talented and motivated computer scientists can achieve in just one year. With the right commitment of resources and drawing upon the best talent, countries like Saudi Arabia can easily catch up with their global peers.

The Middle East can also lead the world in the use of renewables to run power-hungry generative AI models.

As it seeks to diversify its economy, Saudi Arabia is actively promoting the use of alternative energy sources such as solar, which could power server farms and reduce their carbon footprint — a growing concern as AI becomes more widespread.

It may take time for countries to figure out their strategy for building a sovereign AI. But it is critical for the Arab world to quickly catalyze the creation of culturally appropriate LLMs and build a rich ecosystem to allow AI-powered Arabic apps to blossom.

A recent encounter with a female sales assistant at a computer store in Riyadh served as an apt reminder of what is at stake. Dressed in jeans and sporting a tattoo, she embodied the transformative changes that the country is undergoing.

Where are you from, I asked. “I’m Saudi,” she said. “One day I want to be Saudi Arabia’s Elon Musk.” I hope on my next visit she will pitch me a homegrown AI app.

Kai-Fu Lee is a computer scientist, CEO of 01.AI, chairman of Sinovation Ventures, former president of Google China, and author of “AI 2041” and “AI Superpowers”
 

Disclaimer: Views expressed by writers in this section are their own and do not necessarily reflect Arab News' point of view

Lebanese boy chokes to death at school attempting viral TikTok ‘one-bite challenge’

  • 12-year-old dies trying to eat croissant in a single bite, inspired by online videos of people eating various types of food in one go
  • He attempted the challenge at Jannat Al-Atfal School

BEIRUT: A Lebanese schoolboy choked to death at school while attempting a food challenge that has gone viral on video-hosting platform TikTok.
Lebanese media reported on Friday that 12-year-old Joe Skaff died as he tried to eat a croissant in a single bite. He was said to have been inspired by the online “one-bite challenge” in which people post videos of themselves cramming various foods into their mouths to eat them in one go.
He attempted the challenge at Jannat Al-Atfal School in Keserwan, north of Beirut, but began to choke on the pastry and was unable to breathe.
The school said: “With hearts filled with grief and sorrow, we mourn the death of our dear son and sixth-grader Joe Skaff. Today, during the first break, Joe was exposed to a sudden tragic accident where he suffocated while eating.”
Teachers and a licensed school nurse tried to help the youngster and clear the blockage before an ambulance arrived to take him to hospital but “attempts to save him were unsuccessful.”
The school added: “Joe was a special child with a bright personality and great kindness, and he was loved by his peers and all members of our school community.”


Ikmah, Richard Mille face off for Richard Mille Desert Polo Final in AlUla

  • Ikmah dominated Group 1 with two victories, beating Dadan 4-2 and the AlUla team 5-4
  • Richard Mille topped Group 2 after defeating Budget 3-0 and drawing 2-2 with Bentley

ALULA: Ikmah and Richard Mille have qualified for the final of the Richard Mille Desert Polo Championship 2025 and will face off in the showdown at AlUla’s Al Fursan Village on Saturday.
The fourth edition of the prestigious competition’s finale takes place at 4 p.m. Saudi time.
Ikmah dominated Group 1 with two victories, beating Dadan 4-2 and the AlUla team 5-4.
Meanwhile, Richard Mille topped Group 2 after defeating Budget 3-0 and drawing 2-2 with Bentley.
The tournament’s Friday matches saw intense competition in the presence of distinguished figures, including Piero Dillier, president of the International Polo Federation, Amr Zidan, president of the Saudi Polo Federation, and Ziyad Al-Suhaibani, head of sports at the Royal Commission for AlUla.
Meanwhile, AlUla battles against Bentley for third place, and Dadan faces Budget to determine the fifth and sixth positions before the final.
Speaking about the event’s significance, Faisal bin Duwies, the Saudi Polo Federation’s CEO, expressed pride in Saudi Arabia’s growing influence in the global sports arena.
“We take pride in seeing the Kingdom lead the global sports scene through prestigious championships.”
He emphasized the tournament’s role in fostering talent, highlighting the participation of polo legends from around the world, which enriches the experience for all competing players.
As anticipation builds, the final promises to be a thrilling display of skill and strategy, further cementing Saudi Arabia’s position as a premier destination for world-class polo.
The championship is being organized in coordination with the Royal Commission for AlUla.


Red Sea trade route will remain too risky: industry executives

  • The leader of the Houthis said on Thursday that the group would monitor the implementation of a ceasefire deal between Israel and Hamas aimed at ending the war in Gaza and continue its attacks on vessels or Israel if it is breached

LONDON: Companies transporting their products around the world are not ready to return to the Red Sea trade route in the wake of a Gaza ceasefire deal because of uncertainty over whether the Houthis will continue to attack shipping, industry executives said.

The EU’s naval force in the Red Sea said its “threat assessment remains unchanged.”

The leader of the Houthis said on Thursday that the group would monitor the implementation of a ceasefire deal between Israel and Hamas aimed at ending the war in Gaza and continue its attacks on vessels or Israel if it is breached.

Executives from shipping, insurance, and retail industries said the risks remained too high to resume voyages through the Bab Al-Mandab Strait in the Red Sea, through which exports to Western markets from the Gulf and Asia must pass before entering the Suez Canal.

“There is no way I’m putting any of my merchandise on a boat that’s going to go through the Red Sea for some time to come,” said Jay Foreman, CEO of US-based Basic Fun, which supplies toys to major US retailers like Walmart and Amazon.com. Matt Castle, vice president of global forwarding with logistics group C.H. Robinson, said: “It’s not likely the industry will see a large shift back to the Suez Canal in the short term.”

He said this was due to the challenges of securing cargo insurance given perceived high risks and time constraints, as implementing a new ocean shipping plan would take weeks or months.

 


Avishka Fernando’s record-breaking knock leads Sharjah Warriorz to victory in ILT20

Updated 17 January 2025
  • He smashes the fastest 50 in tournament history, reaching the milestone in just 16 balls on his way to a total of 81 from 27 balls
  • Dubai Capitals set a daunting target of 202, led by Shai Hope’s 83 off 49 balls, but the Warriorz bats let loose to reach it in only 18.1 overs

DUBAI: Avishka Fernando produced a stunning display of power-hitting to guide the Sharjah Warriorz to a thrilling five-wicket victory over the Dubai Capitals in the DP World International League T20 on Friday.

Fernando smashed the fastest half-century in the tournament’s history, reaching the milestone in just 16 balls, as the Warriorz chased a daunting target of 202 at the Sharjah Cricket Stadium. His explosive 81 off 27 balls, featuring eight sixes and six fours, anchored the chase, which was completed in 18.1 overs.

“I wanted to play my natural game and I thank God for how things turned out,” he said. “The wicket was great to bat on, and when they scored 200 I knew I had to play positively. It came naturally, and I’m grateful for that.

“I didn’t realize I had broken the record for the fastest 50 but I’m happy with the performance.”

The Capitals had posted an imposing 201/5, led by Shai Hope’s 83 off 49 balls. Captain Sikandar Raza praised his team’s batting but acknowledged the challenge created by late-game conditions.

“I thought it was a very good total. You don’t usually see 202 in Sharjah, so I was pretty confident it was a strong score, and we should have been able to win the game.

“Shai has been a fantastic addition to the team and hopefully the other players around him step up as well. The wicket got better later and the dew made it tougher. The cutters weren’t holding in the wicket either. Our fielding could have been better and Avishka played brilliantly.”

Johnson Charles (37 runs off 19 balls) and Jason Roy (26 off 21) provided the Warriorz with a strong start, while Luke Wells helped complete the job with a composed 31 not out from 17 balls.

The Capitals’ bowlers struggled under the batting onslaught, with Gulbadin Naib conceding 27 runs in a single over, the most in the tournament’s history.

The victory highlighted the strength of the Warriorz, in particular their explosive batting lineup, with Fernando’s heroics setting the tone for what already promises to be an exciting season of cricket.


Diriyah Season launches At-Turaif District activities

Updated 17 January 2025
  • Diriyah Season 2024/25 began on Dec. 20 last year, under the theme “Entertain Your Cultural Curiosity”

DIRIYAH, Riyadh: Diriyah Season on Friday launched a new program of activities at the At-Turaif District, a UNESCO World Heritage site, offering visitors a range of cultural and historical experiences aimed at exploring Saudi Arabia’s heritage.

The program highlights key moments from the history of the first Saudi state through workshops, theatrical performances, and cultural sessions.

Visitors will engage in activities such as traditional Najdi design workshops, poetry recitations, and demonstrations of the Saudi Arda, the Kingdom’s traditional sword dance.

This year’s program also features the updated 850 Interactive Experience, named after the establishment of Diriyah in the Islamic calendar year 850 AH (1446 or 1447 AD in the Gregorian calendar).

The experience uses storytelling and puzzles to explore notable historical figures, including Imam Turki bin Abdullah, and key events from Diriyah’s history.

Workshops in traditional construction techniques, historical games, and exhibitions showcasing the district’s architecture are designed to deepen understanding of the region’s cultural significance, while live performances and discussions will further connect audiences with the stories of At-Turaif’s past, including its role in Saudi Arabia’s early statehood.

Diriyah Season 2024/25 began on Dec. 20 last year, under the theme “Entertain Your Cultural Curiosity.” The event aims to promote Diriyah as a cultural hub, combining historical education with live entertainment and creative showcases.